Importance Sampling for Fair Policy Selection

Authors

  • Shayan Doroudi
  • Philip S. Thomas
  • Emma Brunskill
Abstract

We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated from running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair—they can select the worse of two policies more often than not. We give two examples where the unfairness of importance sampling could be practically concerning. We then present sufficient conditions to theoretically guarantee fairness and a related notion of safety. Finally, we provide a practical importance sampling-based estimator to help mitigate one of the systematic sources of unfairness resulting from using importance sampling for policy selection.
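
The claim that importance sampling can pick the worse policy more often than not can be illustrated with a small simulation. The sketch below is a toy construction for illustration only, not an example taken from the paper: the one-step problem, the reward values, the behavior policy, and all names (`REWARD`, `BEHAVIOR`, `POLICY_A`, `POLICY_B`, `is_estimate`, `worse_policy_selection_rate`) are assumptions. It uses the ordinary importance sampling estimator, which reweights each observed reward by the ratio of the evaluated policy's action probability to the behavior policy's.

```python
import random

# Hypothetical one-step problem: two actions with deterministic rewards.
REWARD = {0: 1.0, 1: 1.2}
BEHAVIOR = {0: 0.9, 1: 0.1}   # assumed policy that generated the historical data
POLICY_A = {0: 1.0, 1: 0.0}   # always action 0, true value 1.0 (worse)
POLICY_B = {0: 0.0, 1: 1.0}   # always action 1, true value 1.2 (better)


def is_estimate(policy, actions):
    """Ordinary importance sampling estimate of a policy's expected reward."""
    total = 0.0
    for a in actions:
        weight = policy[a] / BEHAVIOR[a]  # importance weight for this sample
        total += weight * REWARD[a]
    return total / len(actions)


def worse_policy_selection_rate(n_samples=5, n_trials=10_000, seed=0):
    """Fraction of simulated datasets on which the worse policy A is selected."""
    rng = random.Random(seed)
    picks_worse = 0
    for _ in range(n_trials):
        # Draw a small dataset of actions from the behavior policy.
        actions = [0 if rng.random() < BEHAVIOR[0] else 1
                   for _ in range(n_samples)]
        # Select whichever policy has the higher estimated value.
        if is_estimate(POLICY_A, actions) > is_estimate(POLICY_B, actions):
            picks_worse += 1
    return picks_worse / n_trials


print(worse_policy_selection_rate())
```

In this toy setting, both estimates are unbiased, yet policy B's estimate is zero whenever action 1 never appears in the data. With five samples that happens with probability 0.9^5 ≈ 0.59, so the worse policy A is selected on a majority of datasets.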

Publication date: 2017